Single cell seq after sorting for PhenoID
sample1 = Neurons1 sample2 = Neurons2 sample3 = Glia1 - Astrocytes (CD44+) sample4 = Glia2 - Radial Glia (CD44-)
In HPC I have run steps of scrnabox (custom pipeline in progress) 1. Cell Ranger for feature seq 2. Create Seurat Objects 3. Apply minimum filtering and calculate percent mitochondria.
I have technical 3 replicates with hashtag labels at this point I haven’t yet demultiplex the hashtags. The data here will be treated as one sample. I sorted three separate samples and pooled them together.
# set up the environment
Warning message:
R graphics engine version 15 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.
library(Seurat)
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
Attaching SeuratObject
Attaching sp
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
library(Matrix)
library(ggplot2)
rm(list = ls())
Read in the seurat objects made in compute canada
# this seems to never load I'll use step 3 output that has some filtering
# nFeature_RNA > 180 and percent.mt < 25
pathway <- "/Users/rhalenathomas/Documents/Data/scRNAseq/PhenoID/scRNAseqSorted/objs/"
Neurons1 <- readRDS(paste(pathway,"seu1.rds",sep = ""))
Neurons2 <- readRDS(paste(pathway,"seu2.rds",sep = ""))
Glia1 <- readRDS(paste(pathway,"seu3.rds",sep = ""))
Glia2 <- readRDS(paste(pathway,"seu4.rds",sep = ""))
Neurons1
An object of class Seurat
33541 features across 3952 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
Neurons2
An object of class Seurat
33541 features across 34830 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
Glia1
An object of class Seurat
33541 features across 54723 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
Glia2
An object of class Seurat
33541 features across 10338 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
Have a look at the objects that already have some filtering
See the violin plots
VlnPlot(Neurons1, pt.size = 0.10, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
VlnPlot(Neurons1, pt.size = 0.10, features = c("nFeature_RNA"), y.max = 500)
Warning: Removed 873 rows containing non-finite values (stat_ydensity).
Warning: Removed 873 rows containing missing values (geom_point).
VlnPlot(Neurons1, pt.size = 0.10, features = c("nCount_RNA"), y.max = 2000)
Warning: Removed 546 rows containing non-finite values (stat_ydensity).
Warning: Removed 546 rows containing missing values (geom_point).
# filter more cells
Neuron1.ft <- subset(Neurons1, subset = nFeature_RNA > 250 & nCount_RNA > 250 & nCount_RNA < 10000)
Neuron1.ft
An object of class Seurat
33541 features across 1833 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
# 33541 features across 1833 samples
Neurons 2 - CD56++
VlnPlot(Neurons2, pt.size = 0.10, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
VlnPlot(Neurons2, pt.size = 0.10, features = c("nFeature_RNA"), y.max = 500)
Warning: Removed 5653 rows containing non-finite values (stat_ydensity).
Warning: Removed 5653 rows containing missing values (geom_point).
VlnPlot(Neurons2, pt.size = 0.10, features = c("nFeature_RNA"), y.max = 1000)
Warning: Removed 2379 rows containing non-finite values (stat_ydensity).
Warning: Removed 2379 rows containing missing values (geom_point).
VlnPlot(Neurons2, pt.size = 0.10, features = c("nCount_RNA"), y.max = 2000)
Warning: Removed 2264 rows containing non-finite values (stat_ydensity).
Warning: Removed 2264 rows containing missing values (geom_point).
# filter more cells
Neuron2.ft <- subset(Neurons2, subset = nFeature_RNA > 500 & nCount_RNA > 500 & nCount_RNA < 10000)
Neuron2.ft
An object of class Seurat
33541 features across 5190 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
Glia1 - Astrocyte data
VlnPlot(Glia1, pt.size = 0.10, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
VlnPlot(Glia1, pt.size = 0.10, features = c("nFeature_RNA"), y.max = 5000)
Warning: Removed 82 rows containing non-finite values (stat_ydensity).
Warning: Removed 82 rows containing missing values (geom_point).
VlnPlot(Glia1, pt.size = 0.10, features = c("nFeature_RNA"), y.max = 1000)
Warning: Removed 11811 rows containing non-finite values (stat_ydensity).
Warning: Removed 11811 rows containing missing values (geom_point).
VlnPlot(Glia1, pt.size = 0.10, features = c("nCount_RNA"), y.max = 1000)
Warning: Removed 25751 rows containing non-finite values (stat_ydensity).
Warning: Removed 25751 rows containing missing values (geom_point).
VlnPlot(Glia1, pt.size = 0.10, features = c("nCount_RNA"), y.max = 12000)
Warning: Removed 252 rows containing non-finite values (stat_ydensity).
Warning: Removed 252 rows containing missing values (geom_point).
# extreme filter
Glia1.ft <- subset(Glia1, subset = nFeature_RNA > 500 & nCount_RNA > 300 & nCount_RNA < 10000)
Glia1.ft
An object of class Seurat
33541 features across 37813 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
Glia1
An object of class Seurat
33541 features across 54723 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
VlnPlot(Glia1.ft, pt.size = 0.10, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
NA
NA
NA
NA
NA
Glia2 - Radial Glia
## Filter Glia 2
VlnPlot(Glia2, pt.size = 0.10, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
VlnPlot(Glia2, pt.size = 0.10, features = c("nFeature_RNA"), y.max = 5000)
Warning: Removed 61 rows containing non-finite values (stat_ydensity).
Warning: Removed 61 rows containing missing values (geom_point).
VlnPlot(Glia2, pt.size = 0.10, features = c("nFeature_RNA"), y.max = 1000)
Warning: Removed 2435 rows containing non-finite values (stat_ydensity).
Warning: Removed 2435 rows containing missing values (geom_point).
VlnPlot(Glia2, pt.size = 0.10, features = c("nCount_RNA"), y.max = 1000)
Warning: Removed 3194 rows containing non-finite values (stat_ydensity).
Warning: Removed 3194 rows containing missing values (geom_point).
VlnPlot(Glia2, pt.size = 0.10, features = c("nCount_RNA"), y.max = 12000)
Warning: Removed 199 rows containing non-finite values (stat_ydensity).
Warning: Removed 199 rows containing missing values (geom_point).
# extreme filter
Glia2.ft <- subset(Glia1, subset = nFeature_RNA > 500 & nCount_RNA > 500 & nCount_RNA < 10000)
Glia2.ft
An object of class Seurat
33541 features across 37813 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
VlnPlot(Glia1.ft, pt.size = 0.10, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), ncol = 3)
# there are so many suposed cells I am concerned the high read cells are actually doublets.
Analyze each dataset - get clusters
# cluster the neurons
seu <- Neuron1.ft
seu$orig.ident <- 'Neurons1'
seu <- NormalizeData(seu, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
seu <- FindVariableFeatures(seu, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
# Identify the 10 most highly variable genes
top10 <- head(VariableFeatures(seu), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(seu)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
When using repel, set xnudge and ynudge to 0 for optimal results
seu <- ScaleData(seu)
Centering and scaling data matrix
|
| | 0%
|
|================================================ | 50%
|
|================================================================================================| 100%
seu <- RunPCA(seu)
PC_ 1
Positive: CDH19, MPZ, COL4A1, COL4A2, COL3A1, ZEB2, CTSC, OLFML2A, MIA, FN1
NRXN1, ERBB3, TGFBR2, COL14A1, COL1A2, FST, COL5A2, COL28A1, PLAT, SHC4
NTM, PMEPA1, KCTD12, CAVIN3, SOX10, LAMC1, BGN, IFI16, LIMCH1, GAS2L3
Negative: CLU, PTGDS, DLK1, SPARCL1, PTN, APOE, SAT1, TFPI2, GDF10, HPD
TMSB4X, C1orf61, MGST1, TRH, NRIP3, RBP4, WIF1, NUPR1, IGFBP7, LIX1
FGFBP1, ESM1, TPPP3, GNG11, BAALC-AS2, BAALC, LY6H, WNT2B, SFRP2, CRYAB
PC_ 2
Positive: CELF4, ANK3, CACNA2D1, PCDH9, NCKAP5, CHGB, SYT1, NEUROD1, PCLO, GNB3
OTX2, STMN2, PTPRR, INA, OCIAD2, IMPG2, DCX, DYNC1I1, SSTR2, ZFHX4
BTBD8, STMN1, GRIA2, MARCH1, SLC1A2, ATP1A3, STMN4, AMER2, BEX1, FAM19A4
Negative: CDH19, COL3A1, MPZ, CTSC, OLFML2A, COL4A2, COL4A1, MIA, VIM, FN1
TGFBR2, COL1A2, ERBB3, ZEB2, S100B, SPARC, COL14A1, S100A10, IFITM3, COL5A2
CAVIN3, PLAT, COL28A1, SOX10, FST, PLEKHA4, GAS2L3, CXCL12, ITIH5, LGALS1
PC_ 3
Positive: PCAT4, IMPG2, NEUROD1, NRXN1, CDH19, PLPPR4, TPH1, SYT1, ZEB2, MPZ
MIA, GSG1, STMN2, OLFML2A, SST, OLFM3, FAM19A4, CTSC, GNB3, PTPRR
BTBD8, COL3A1, GAS2L3, BCAT1, SOX10, COL4A2, ERBB3, CHGB, SORCS1, COL28A1
Negative: TPBG, SLC7A8, WLS, FSTL1, HES1, ANO10, PAPPA2, CDH2, MSX1, SLC2A1
ZFP36L1, PIP5K1B, NFIA, TSC22D1, SLCO1C1, SOX2, PRNP, LINC00473, SPRY1, WIF1
NOV, COLEC12, PLCG2, GDF10, SPATS2L, RRBP1, BMP7, PAG1, WFIKKN2, RFX4
PC_ 4
Positive: EOMES, MGAT4C, ELAVL3, LHX1, RASGRP1, ADCYAP1, ELAVL4, SLC16A12, CELF4, TSHZ2
PTPRO, KCNK1, SCN9A, RELN, EPS8, RAB3B, SLIT1, GRID2, ASCL1, KRT222
ZNF385D, DCLK1, BDNF, ELAVL2, RGMB, PLCXD3, UNC5D, RALYL, PPP1R14C, DNER
Negative: PCAT4, TPH1, IMPG2, BCAT1, SST, FAM19A4, BTBD8, ETV3L, GSG1, PLPPR4
IL15, GABRG2, PDE6H, OLFM3, GNB3, CLSTN2, CRABP2, RBFOX1, AC007349.2, LINC02208
AIPL1, RD3, KCNH5, NCKAP5, PRKG2, AANAT, LRRC39, ANO2, ISOC1, AP000459.1
PC_ 5
Positive: PTN, PTPRZ1, SPARCL1, MEGF10, ESM1, DLK1, GABBR2, ATP1A2, NRIP3, GDF10
NELL2, SOX2, CBLN1, APCDD1, SYTL4, SERPINI1, ARHGAP26, PTGDS, VIPR2, FTL
MARCKS, GNG11, TRH, IL17RD, EPHB1, RBP4, RSPO2, APOE, OGFRL1, AKR1C1
Negative: CYP1B1, CP, ECEL1, CXCL14, IGFBP3, WIF1, WFIKKN2, MALAT1, FHIT, EPAS1
SLC4A10, EMX2, PAPPA2, TRPM3, EFEMP1, BMP4, MGP, KCNJ13, ID1, EXPH5
KRT18, KRT8, FBLN1, MSX1, FOS, GPNMB, DCN, CDC42EP3, COL6A3, SERPINF1
Idents(seu) <- 'orig.ident'
plot <- DimPlot(seu, reduction = "pca")
plot3 <- ElbowPlot(seu,ndims = 50)
plot3
plot2
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Removed 13701 rows containing missing values (geom_point).
plot
NA
NA
NA
# umap
seu <- RunUMAP(seu, reduction = "pca", n.neighbors = 43, dims = 1:25)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
08:59:48 UMAP embedding parameters a = 0.9922 b = 1.112
08:59:48 Read 1833 rows and found 25 numeric columns
08:59:48 Using Annoy for neighbor search, n_neighbors = 43
08:59:48 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
08:59:48 Writing NN index file to temp file /var/folders/k4/khtkczkd5tn732ftjpwgtr240000gn/T//RtmpioYRC5/file1726b7ea878e1
08:59:48 Searching Annoy index using 1 thread, search_k = 4300
08:59:48 Annoy recall = 100%
08:59:49 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 43
08:59:49 Initializing from normalized Laplacian + noise (using irlba)
08:59:49 Commencing optimization for 500 epochs, with 107418 positive edges
Using method 'umap'
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
08:59:53 Optimization finished
DimPlot(seu, reduction = "umap", group.by = "orig.ident")
NA
NA
NA
Make the clusters Neurons1
seu <- FindNeighbors(seu, dims = 1:25, k.param = 43)
Computing nearest neighbor graph
Computing SNN
seu <- FindClusters(seu, resolution = c(0,0.2,0.25,0.5,0.8))
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1833
Number of edges: 146997
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 1.0000
Number of communities: 1
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1833
Number of edges: 146997
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8606
Number of communities: 4
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1833
Number of edges: 146997
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8446
Number of communities: 5
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1833
Number of edges: 146997
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7787
Number of communities: 6
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1833
Number of edges: 146997
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7092
Number of communities: 8
Elapsed time: 0 seconds
seu <- FindClusters(seu, resolution = c(1.2))
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1833
Number of edges: 146997
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.6403
Number of communities: 10
Elapsed time: 0 seconds
library(clustree)
Loading required package: ggraph
Attaching package: ‘ggraph’
The following object is masked from ‘package:sp’:
geometry
clustree(seu, prefix = "RNA_snn_res.")
DimPlot(seu)
# look a lot at the clusers
VlnPlot(seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), group.by = 'seurat_clusters', ncol = 1)
# these cells might be grouping by - how much MT and number of Features
# clusters 4,5,6 have more features
VlnPlot(seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), group.by = 'RNA_snn_res.0.25', ncol = 1)
# here cluster 3 has higher expression, cluster 1 and 4 have similar Features RNA
# cluster 0 has higher percent MT levels
VlnPlot(seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), group.by = 'RNA_snn_res.0.5', ncol = 1)
VlnPlot(seu, features = c("nFeature_RNA", "nCount_RNA", "percent.mt"), group.by = 'RNA_snn_res.0.2', ncol = 1)
# now only cluster 0 have high MT and low features, clusters 1,2,3 have simiular RNA and Counts
Find the cluster markers for Neurons1
Idents(seu) <- 'RNA_snn_res.0.2'
ClusterMarkers <- FindAllMarkers(seu)
Calculating cluster 0
| | 0 % ~calculating
|+ | 1 % ~10s
|++ | 2 % ~09s
|++ | 3 % ~09s
|+++ | 4 % ~09s
|+++ | 5 % ~09s
|++++ | 6 % ~09s
|++++ | 7 % ~09s
|+++++ | 8 % ~08s
|+++++ | 9 % ~08s
|++++++ | 10% ~09s
|++++++ | 11% ~08s
|+++++++ | 12% ~08s
|+++++++ | 13% ~08s
|++++++++ | 14% ~08s
|++++++++ | 15% ~08s
|+++++++++ | 16% ~08s
|+++++++++ | 17% ~08s
|++++++++++ | 18% ~08s
|++++++++++ | 19% ~07s
|+++++++++++ | 20% ~07s
|+++++++++++ | 21% ~07s
|++++++++++++ | 22% ~07s
|++++++++++++ | 23% ~07s
|+++++++++++++ | 24% ~07s
|+++++++++++++ | 26% ~07s
|++++++++++++++ | 27% ~07s
|++++++++++++++ | 28% ~07s
|+++++++++++++++ | 29% ~07s
|+++++++++++++++ | 30% ~06s
|++++++++++++++++ | 31% ~06s
|++++++++++++++++ | 32% ~06s
|+++++++++++++++++ | 33% ~06s
|+++++++++++++++++ | 34% ~06s
|++++++++++++++++++ | 35% ~06s
|++++++++++++++++++ | 36% ~06s
|+++++++++++++++++++ | 37% ~06s
|+++++++++++++++++++ | 38% ~06s
|++++++++++++++++++++ | 39% ~06s
|++++++++++++++++++++ | 40% ~05s
|+++++++++++++++++++++ | 41% ~05s
|+++++++++++++++++++++ | 42% ~05s
|++++++++++++++++++++++ | 43% ~05s
|++++++++++++++++++++++ | 44% ~05s
|+++++++++++++++++++++++ | 45% ~05s
|+++++++++++++++++++++++ | 46% ~05s
|++++++++++++++++++++++++ | 47% ~05s
|++++++++++++++++++++++++ | 48% ~05s
|+++++++++++++++++++++++++ | 49% ~05s
|+++++++++++++++++++++++++ | 50% ~05s
|++++++++++++++++++++++++++ | 51% ~04s
|+++++++++++++++++++++++++++ | 52% ~04s
|+++++++++++++++++++++++++++ | 53% ~04s
|++++++++++++++++++++++++++++ | 54% ~04s
|++++++++++++++++++++++++++++ | 55% ~04s
|+++++++++++++++++++++++++++++ | 56% ~04s
|+++++++++++++++++++++++++++++ | 57% ~04s
|++++++++++++++++++++++++++++++ | 58% ~04s
|++++++++++++++++++++++++++++++ | 59% ~04s
|+++++++++++++++++++++++++++++++ | 60% ~04s
|+++++++++++++++++++++++++++++++ | 61% ~04s
|++++++++++++++++++++++++++++++++ | 62% ~03s
|++++++++++++++++++++++++++++++++ | 63% ~03s
|+++++++++++++++++++++++++++++++++ | 64% ~03s
|+++++++++++++++++++++++++++++++++ | 65% ~03s
|++++++++++++++++++++++++++++++++++ | 66% ~03s
|++++++++++++++++++++++++++++++++++ | 67% ~03s
|+++++++++++++++++++++++++++++++++++ | 68% ~03s
|+++++++++++++++++++++++++++++++++++ | 69% ~03s
|++++++++++++++++++++++++++++++++++++ | 70% ~03s
|++++++++++++++++++++++++++++++++++++ | 71% ~03s
|+++++++++++++++++++++++++++++++++++++ | 72% ~02s
|+++++++++++++++++++++++++++++++++++++ | 73% ~02s
|++++++++++++++++++++++++++++++++++++++ | 74% ~02s
|++++++++++++++++++++++++++++++++++++++ | 76% ~02s
|+++++++++++++++++++++++++++++++++++++++ | 77% ~02s
|+++++++++++++++++++++++++++++++++++++++ | 78% ~02s
|++++++++++++++++++++++++++++++++++++++++ | 79% ~02s
|++++++++++++++++++++++++++++++++++++++++ | 80% ~02s
|+++++++++++++++++++++++++++++++++++++++++ | 81% ~02s
|+++++++++++++++++++++++++++++++++++++++++ | 82% ~02s
|++++++++++++++++++++++++++++++++++++++++++ | 83% ~02s
|++++++++++++++++++++++++++++++++++++++++++ | 84% ~01s
|+++++++++++++++++++++++++++++++++++++++++++ | 85% ~01s
|+++++++++++++++++++++++++++++++++++++++++++ | 86% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 87% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 88% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 89% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 90% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 91% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 92% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 93% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 94% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 95% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=09s
Calculating cluster 1
| | 0 % ~calculating
|+ | 1 % ~08s
|++ | 2 % ~08s
|++ | 3 % ~08s
|+++ | 4 % ~08s
|+++ | 5 % ~07s
|++++ | 6 % ~07s
|++++ | 7 % ~07s
|+++++ | 8 % ~07s
|+++++ | 9 % ~07s
|++++++ | 10% ~07s
|++++++ | 11% ~07s
|+++++++ | 12% ~07s
|+++++++ | 13% ~07s
|++++++++ | 14% ~07s
|++++++++ | 15% ~07s
|+++++++++ | 16% ~06s
|+++++++++ | 18% ~06s
|++++++++++ | 19% ~06s
|++++++++++ | 20% ~06s
|+++++++++++ | 21% ~06s
|+++++++++++ | 22% ~06s
|++++++++++++ | 23% ~06s
|++++++++++++ | 24% ~06s
|+++++++++++++ | 25% ~06s
|+++++++++++++ | 26% ~06s
|++++++++++++++ | 27% ~06s
|++++++++++++++ | 28% ~06s
|+++++++++++++++ | 29% ~05s
|+++++++++++++++ | 30% ~05s
|++++++++++++++++ | 31% ~05s
|++++++++++++++++ | 32% ~05s
|+++++++++++++++++ | 33% ~05s
|++++++++++++++++++ | 34% ~05s
|++++++++++++++++++ | 35% ~05s
|+++++++++++++++++++ | 36% ~05s
|+++++++++++++++++++ | 37% ~05s
|++++++++++++++++++++ | 38% ~05s
|++++++++++++++++++++ | 39% ~05s
|+++++++++++++++++++++ | 40% ~05s
|+++++++++++++++++++++ | 41% ~05s
|++++++++++++++++++++++ | 42% ~04s
|++++++++++++++++++++++ | 43% ~04s
|+++++++++++++++++++++++ | 44% ~04s
|+++++++++++++++++++++++ | 45% ~04s
|++++++++++++++++++++++++ | 46% ~04s
|++++++++++++++++++++++++ | 47% ~04s
|+++++++++++++++++++++++++ | 48% ~04s
|+++++++++++++++++++++++++ | 49% ~04s
|++++++++++++++++++++++++++ | 51% ~04s
|++++++++++++++++++++++++++ | 52% ~04s
|+++++++++++++++++++++++++++ | 53% ~04s
|+++++++++++++++++++++++++++ | 54% ~04s
|++++++++++++++++++++++++++++ | 55% ~04s
|++++++++++++++++++++++++++++ | 56% ~03s
|+++++++++++++++++++++++++++++ | 57% ~03s
|+++++++++++++++++++++++++++++ | 58% ~03s
|++++++++++++++++++++++++++++++ | 59% ~03s
|++++++++++++++++++++++++++++++ | 60% ~03s
|+++++++++++++++++++++++++++++++ | 61% ~03s
|+++++++++++++++++++++++++++++++ | 62% ~03s
|++++++++++++++++++++++++++++++++ | 63% ~03s
|++++++++++++++++++++++++++++++++ | 64% ~03s
|+++++++++++++++++++++++++++++++++ | 65% ~03s
|+++++++++++++++++++++++++++++++++ | 66% ~03s
|++++++++++++++++++++++++++++++++++ | 67% ~03s
|+++++++++++++++++++++++++++++++++++ | 68% ~02s
|+++++++++++++++++++++++++++++++++++ | 69% ~02s
|++++++++++++++++++++++++++++++++++++ | 70% ~02s
|++++++++++++++++++++++++++++++++++++ | 71% ~02s
|+++++++++++++++++++++++++++++++++++++ | 72% ~02s
|+++++++++++++++++++++++++++++++++++++ | 73% ~02s
|++++++++++++++++++++++++++++++++++++++ | 74% ~02s
|++++++++++++++++++++++++++++++++++++++ | 75% ~02s
|+++++++++++++++++++++++++++++++++++++++ | 76% ~02s
|+++++++++++++++++++++++++++++++++++++++ | 77% ~02s
|++++++++++++++++++++++++++++++++++++++++ | 78% ~02s
|++++++++++++++++++++++++++++++++++++++++ | 79% ~02s
|+++++++++++++++++++++++++++++++++++++++++ | 80% ~02s
|+++++++++++++++++++++++++++++++++++++++++ | 81% ~01s
|++++++++++++++++++++++++++++++++++++++++++ | 82% ~01s
|++++++++++++++++++++++++++++++++++++++++++ | 84% ~01s
|+++++++++++++++++++++++++++++++++++++++++++ | 85% ~01s
|+++++++++++++++++++++++++++++++++++++++++++ | 86% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 87% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 88% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 89% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 90% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 91% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 92% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 93% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 94% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 95% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=08s
Calculating cluster 2
| | 0 % ~calculating
|+ | 1 % ~06s
|++ | 2 % ~06s
|++ | 3 % ~05s
|+++ | 4 % ~05s
|+++ | 5 % ~05s
|++++ | 6 % ~05s
|++++ | 8 % ~05s
|+++++ | 9 % ~05s
|+++++ | 10% ~05s
|++++++ | 11% ~05s
|++++++ | 12% ~05s
|+++++++ | 13% ~05s
|+++++++ | 14% ~05s
|++++++++ | 15% ~05s
|+++++++++ | 16% ~05s
|+++++++++ | 17% ~04s
|++++++++++ | 18% ~04s
|++++++++++ | 19% ~04s
|+++++++++++ | 20% ~04s
|+++++++++++ | 22% ~04s
|++++++++++++ | 23% ~04s
|++++++++++++ | 24% ~04s
|+++++++++++++ | 25% ~04s
|+++++++++++++ | 26% ~04s
|++++++++++++++ | 27% ~04s
|++++++++++++++ | 28% ~04s
|+++++++++++++++ | 29% ~04s
|++++++++++++++++ | 30% ~04s
|++++++++++++++++ | 31% ~04s
|+++++++++++++++++ | 32% ~04s
|+++++++++++++++++ | 33% ~04s
|++++++++++++++++++ | 34% ~04s
|++++++++++++++++++ | 35% ~04s
|+++++++++++++++++++ | 37% ~03s
|+++++++++++++++++++ | 38% ~03s
|++++++++++++++++++++ | 39% ~03s
|++++++++++++++++++++ | 40% ~03s
|+++++++++++++++++++++ | 41% ~03s
|+++++++++++++++++++++ | 42% ~03s
|++++++++++++++++++++++ | 43% ~03s
|+++++++++++++++++++++++ | 44% ~03s
|+++++++++++++++++++++++ | 45% ~03s
|++++++++++++++++++++++++ | 46% ~03s
|++++++++++++++++++++++++ | 47% ~03s
|+++++++++++++++++++++++++ | 48% ~03s
|+++++++++++++++++++++++++ | 49% ~03s
|++++++++++++++++++++++++++ | 51% ~03s
|++++++++++++++++++++++++++ | 52% ~03s
|+++++++++++++++++++++++++++ | 53% ~03s
|+++++++++++++++++++++++++++ | 54% ~03s
|++++++++++++++++++++++++++++ | 55% ~02s
|++++++++++++++++++++++++++++ | 56% ~02s
|+++++++++++++++++++++++++++++ | 57% ~02s
|++++++++++++++++++++++++++++++ | 58% ~02s
|++++++++++++++++++++++++++++++ | 59% ~02s
|+++++++++++++++++++++++++++++++ | 60% ~02s
|+++++++++++++++++++++++++++++++ | 61% ~02s
|++++++++++++++++++++++++++++++++ | 62% ~02s
|++++++++++++++++++++++++++++++++ | 63% ~02s
|+++++++++++++++++++++++++++++++++ | 65% ~02s
|+++++++++++++++++++++++++++++++++ | 66% ~02s
|++++++++++++++++++++++++++++++++++ | 67% ~02s
|++++++++++++++++++++++++++++++++++ | 68% ~02s
|+++++++++++++++++++++++++++++++++++ | 69% ~02s
|+++++++++++++++++++++++++++++++++++ | 70% ~02s
|++++++++++++++++++++++++++++++++++++ | 71% ~02s
|+++++++++++++++++++++++++++++++++++++ | 72% ~02s
|+++++++++++++++++++++++++++++++++++++ | 73% ~01s
|++++++++++++++++++++++++++++++++++++++ | 74% ~01s
|++++++++++++++++++++++++++++++++++++++ | 75% ~01s
|+++++++++++++++++++++++++++++++++++++++ | 76% ~01s
|+++++++++++++++++++++++++++++++++++++++ | 77% ~01s
|++++++++++++++++++++++++++++++++++++++++ | 78% ~01s
|++++++++++++++++++++++++++++++++++++++++ | 80% ~01s
|+++++++++++++++++++++++++++++++++++++++++ | 81% ~01s
|+++++++++++++++++++++++++++++++++++++++++ | 82% ~01s
|++++++++++++++++++++++++++++++++++++++++++ | 83% ~01s
|++++++++++++++++++++++++++++++++++++++++++ | 84% ~01s
|+++++++++++++++++++++++++++++++++++++++++++ | 85% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 86% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 87% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 88% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 89% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 90% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 91% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 92% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 94% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 95% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06s
Calculating cluster 3
| | 0 % ~calculating
|+ | 1 % ~09s
|++ | 2 % ~09s
|++ | 3 % ~09s
|+++ | 4 % ~09s
|+++ | 5 % ~09s
|++++ | 6 % ~09s
|++++ | 7 % ~08s
|+++++ | 8 % ~08s
|+++++ | 9 % ~09s
|++++++ | 10% ~09s
|++++++ | 11% ~08s
|+++++++ | 12% ~08s
|+++++++ | 13% ~08s
|++++++++ | 14% ~08s
|++++++++ | 15% ~08s
|+++++++++ | 16% ~08s
|+++++++++ | 18% ~08s
|++++++++++ | 19% ~07s
|++++++++++ | 20% ~07s
|+++++++++++ | 21% ~07s
|+++++++++++ | 22% ~07s
|++++++++++++ | 23% ~07s
|++++++++++++ | 24% ~07s
|+++++++++++++ | 25% ~07s
|+++++++++++++ | 26% ~07s
|++++++++++++++ | 27% ~07s
|++++++++++++++ | 28% ~06s
|+++++++++++++++ | 29% ~06s
|+++++++++++++++ | 30% ~06s
|++++++++++++++++ | 31% ~06s
|++++++++++++++++ | 32% ~06s
|+++++++++++++++++ | 33% ~06s
|++++++++++++++++++ | 34% ~06s
|++++++++++++++++++ | 35% ~06s
|+++++++++++++++++++ | 36% ~06s
|+++++++++++++++++++ | 37% ~06s
|++++++++++++++++++++ | 38% ~06s
|++++++++++++++++++++ | 39% ~05s
|+++++++++++++++++++++ | 40% ~05s
|+++++++++++++++++++++ | 41% ~05s
|++++++++++++++++++++++ | 42% ~05s
|++++++++++++++++++++++ | 43% ~05s
|+++++++++++++++++++++++ | 44% ~05s
|+++++++++++++++++++++++ | 45% ~05s
|++++++++++++++++++++++++ | 46% ~05s
|++++++++++++++++++++++++ | 47% ~05s
|+++++++++++++++++++++++++ | 48% ~05s
|+++++++++++++++++++++++++ | 49% ~04s
|++++++++++++++++++++++++++ | 51% ~04s
|++++++++++++++++++++++++++ | 52% ~04s
|+++++++++++++++++++++++++++ | 53% ~04s
|+++++++++++++++++++++++++++ | 54% ~04s
|++++++++++++++++++++++++++++ | 55% ~04s
|++++++++++++++++++++++++++++ | 56% ~04s
|+++++++++++++++++++++++++++++ | 57% ~04s
|+++++++++++++++++++++++++++++ | 58% ~04s
|++++++++++++++++++++++++++++++ | 59% ~04s
|++++++++++++++++++++++++++++++ | 60% ~04s
|+++++++++++++++++++++++++++++++ | 61% ~03s
|+++++++++++++++++++++++++++++++ | 62% ~03s
|++++++++++++++++++++++++++++++++ | 63% ~03s
|++++++++++++++++++++++++++++++++ | 64% ~03s
|+++++++++++++++++++++++++++++++++ | 65% ~03s
|+++++++++++++++++++++++++++++++++ | 66% ~03s
|++++++++++++++++++++++++++++++++++ | 67% ~03s
|+++++++++++++++++++++++++++++++++++ | 68% ~03s
|+++++++++++++++++++++++++++++++++++ | 69% ~03s
|++++++++++++++++++++++++++++++++++++ | 70% ~03s
|++++++++++++++++++++++++++++++++++++ | 71% ~03s
|+++++++++++++++++++++++++++++++++++++ | 72% ~02s
|+++++++++++++++++++++++++++++++++++++ | 73% ~02s
|++++++++++++++++++++++++++++++++++++++ | 74% ~02s
|++++++++++++++++++++++++++++++++++++++ | 75% ~02s
|+++++++++++++++++++++++++++++++++++++++ | 76% ~02s
|+++++++++++++++++++++++++++++++++++++++ | 77% ~02s
|++++++++++++++++++++++++++++++++++++++++ | 78% ~02s
|++++++++++++++++++++++++++++++++++++++++ | 79% ~02s
|+++++++++++++++++++++++++++++++++++++++++ | 80% ~02s
|+++++++++++++++++++++++++++++++++++++++++ | 81% ~02s
|++++++++++++++++++++++++++++++++++++++++++ | 82% ~02s
|++++++++++++++++++++++++++++++++++++++++++ | 84% ~01s
|+++++++++++++++++++++++++++++++++++++++++++ | 85% ~01s
|+++++++++++++++++++++++++++++++++++++++++++ | 86% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 87% ~01s
|++++++++++++++++++++++++++++++++++++++++++++ | 88% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 89% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++ | 90% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 91% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++ | 92% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 93% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 94% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 95% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=09s
top5 <- ClusterMarkers %>% group_by(cluster) %>% top_n(n=5, wt = avg_log2FC)
DoHeatmap(seu, features = top5$gene, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.2')
#write.csv(ClusterMarkers, "/Users/rhalenathomas/Documents/Data/scRNAseq/PhenoID/scRNAseqSorted/ClusterMarkers_neurons1_res025.csv")
write.csv(ClusterMarkers, "/Users/rhalenathomas/Documents/Data/scRNAseq/PhenoID/scRNAseqSorted/ClusterMarkers_neurons1_res02.csv")
DimPlot(seu, group.by = 'RNA_snn_res.0.2', reduction = 'umap')
Get the most highly expressed genes in the total data (Neurons1)
Filters out specific genes
seu.ft <- seu[!grepl("MALAT1", rownames(seu)), ]
seu.ft <- seu.ft[!grepl("^MT-", rownames(seu.ft)), ]
Try to find doublets with doublet finder
remotes::install_github('chris-mcginnis-ucsf/DoubletFinder')
Skipping install of 'DoubletFinder' from a github remote, the SHA1 (67fb8b58) has not changed since last install.
Use `force = TRUE` to force installation
suppressMessages(require(DoubletFinder))
Do the double cells have more genes than the singlet??
VlnPlot(seu.d, features = "nFeature_RNA", group.by = DF.name, pt.size = 0.1)
NA
NA
dim(seu.d)
[1] 33538 1723
dim(seu)
[1] 33538 1833
Save the filtered, doublet removed Neurons object Re-run PCA for clustering
saveRDS(seu.d, "/Users/rhalenathomas/Documents/Data/scRNAseq/PhenoID/scRNAseqSorted/objs/NeuronsFilteredSeu28092022.RDS")
DAsubgroups data has not be re-processed Run standard workflow chunk
seu <- AIW60
seu <- NormalizeData(seu, normalization.method = "LogNormalize", scale.factor = 10000)
Performing log-normalization
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
seu <- FindVariableFeatures(seu, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
seu <- ScaleData(seu)
Centering and scaling data matrix
|
| | 0%
|
|================================================ | 50%
|
|================================================================================================| 100%
seu <- RunPCA(seu)
PC_ 1
Positive: STMN2, DCX, INA, MAP2, SOX4, KIF5C, NCAM1, GAP43, SOX11, NSG2
ANK3, GPM6A, SYT1, TUBB2A, NRXN1, PKIA, NR2F1, STMN4, UCHL1, RTN1
TAGLN3, RUNX1T1, DPYSL3, NSG1, NEFM, PGM2L1, PRKAR2B, PBX1, POU2F2, ELAVL4
Negative: SPARC, ZFP36L1, VIM, TPBG, MDK, ANXA5, GNG5, CD9, CA2, CAST
ANXA2, HES1, FSTL1, NFIA, CD99, IGFBP2, TTR, IGFBP7, GSTP1, ID1
ID3, CYSTM1, PLTP, ZFP36L2, TRPM3, FAT1, COLEC12, B2M, SPARCL1, TMEM123
PC_ 2
Positive: TTR, TPBG, IGFBP7, ANXA2, TRPM3, SLC7A8, CD9, SPINT2, CHCHD2, NFIA
SPARCL1, COLEC12, PPIC, PRNP, WFIKKN2, PIFO, BMP4, DMKN, LINC01088, ID1
MITF, CA2, ECEL1, SLC5A3, SERPINF1, CFAP126, PCP4, KRT18, SELENOP, CPVL
Negative: NUSAP1, TOP2A, MKI67, CENPF, PBK, CDK1, TPX2, NUF2, UBE2C, BIRC5
MAD2L1, CCNA2, ASPM, NCAPG, SPC25, PCLAF, CENPU, PIMREG, NDC80, KNL1
SMC4, KIF15, DLGAP5, SGO1, CDCA2, CDCA8, MIS18BP1, RRM2, CENPE, KIF11
PC_ 3
Positive: TTYH1, PTPRZ1, NES, QKI, VIM, BOC, FGFBP3, HES5, SOX2, SLC1A3
RFX4, FAM181B, IGDCC3, FAM181A, EDNRB, LINC00461, PON2, RPS27L, VCAM1, CCND1
ARHGEF6, ZFP36L1, PLP1, JAG1, PCDH18, ITGB8, SMOC1, DLK1, TMEM38B, TFDP2
Negative: RTN1, STMN2, NSG2, GAP43, INA, PRKAR2B, MAPT, UCHL1, NRXN1, PKIA
TRPM3, IGFBP7, NSG1, C11orf88, NCAM1, SHTN1, DLGAP5, PCP4, SOBP, DCX
CFAP126, ANK3, TTR, NEK2, HIST1H4C, GPM6A, XPR1, CELF4, SPINT2, MITF
PC_ 4
Positive: C11orf88, CAPSL, C1orf194, FAM81B, AKAP14, FAM183A, CFAP126, C9orf24, RSPH1, TCTEX1D1
ROPN1L, PIFO, C5orf49, CFAP52, DAW1, ARMC3, CCDC170, FAM216B, EFCAB1, MAP3K19
CP, SPAG6, CFAP45, AL357093.2, TEKT1, ANKRD66, MORN5, PTPRC, DYNLRB2, CFAP299
Negative: CNTNAP2, SYT4, ZIC1, APP, ZIC2, CBLN1, TNC, APCDD1, PTN, EPHA7
PAX6, SLITRK6, DSP, SPARCL1, WLS, ZFHX4, RSPO2, ATP1A2, CDK6, TXNIP
IL17RD, TPBG, AMER2, GDF10, ZIC4, PCDH9, WNT2B, HES1, GAP43, ITGA6
PC_ 5
Positive: CALM1, TMSB4X, C11orf88, CKB, ACAT2, C1orf194, FGFBP3, C5orf49, HMGCS1, NR2F1
SCD, PTPRZ1, AKAP14, CAPSL, MSMO1, CFAP126, RPS2, IDI1, FAM81B, FDPS
FAM183A, PEG10, RSPH1, FDFT1, TUBA1B, GPM6B, ROPN1L, ARL4A, QKI, NTRK2
Negative: CCNB1, PLK1, UBE2C, BUB1, CDC20, ASPM, KIF20A, CENPA, CDCA8, DLGAP5
AURKA, KIF2C, CCNB2, NEK2, FAM83D, TTK, NUF2, PIF1, TPX2, KIF14
GTSE1, CDCA3, CDKN3, PIMREG, HMMR, CENPE, CDCA2, DEPDC1, CKAP2L, SGO2
seu <- RunUMAP(seu, reduction = "pca", n.neighbors = 123, dims = 1:30)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
13:03:03 UMAP embedding parameters a = 0.9922 b = 1.112
13:03:03 Read 15339 rows and found 30 numeric columns
13:03:03 Using Annoy for neighbor search, n_neighbors = 123
13:03:03 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
13:03:04 Writing NN index file to temp file /var/folders/k4/khtkczkd5tn732ftjpwgtr240000gn/T//RtmphW80ig/file17bc7524a9002
13:03:04 Searching Annoy index using 1 thread, search_k = 12300
13:03:14 Annoy recall = 100%
13:03:14 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 123
13:03:16 Initializing from normalized Laplacian + noise (using irlba)
13:03:17 Commencing optimization for 200 epochs, with 2087112 positive edges
Using method 'umap'
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
13:03:32 Optimization finished
DimPlot(seu, reduction = "umap")
Look at the DA data from Kamath
Annotate clusters Use: Organoid data, public brain data (LaManno, Lake, Mascako)
top10 <- head(VariableFeatures(DAsubtypes.sub), 10)
# plot variable features with and without labels
plot1 <- VariableFeaturePlot(DAsubtypes.sub)
plot2 <- LabelPoints(plot = plot1, points = top10, repel = TRUE)
When using repel, set xnudge and ynudge to 0 for optimal results
plot2
Warning: Transformation introduced infinite values in continuous x-axis
Warning: Removed 9387 rows containing missing values (geom_point).
See how each looks on UMAP
DimPlot(seu.q, group.by = 'RNA_snn_res.1.2')
DimPlot(seu.q, group.by = 'RNA_snn_res.0.2')
DimPlot(seu.q, group.by = 'AIW60.pred')
DimPlot(seu.q, group.by = 'MBOAIW.pred')
DimPlot(seu.q, group.by = 'MBOAST23.pred')
NA
NA
FindClusters(seu.q, resolution = c(0, 0.2,0.4,0.6))
Error in FindClusters.Seurat(seu.q, resolution = c(0, 0.2, 0.4, 0.6)) :
Provided graph.name not present in Seurat object
NA
Redo find clusters
seu.q <- FindNeighbors(seu.q, dims = 1:25, k.param = 43)
Computing nearest neighbor graph
Computing SNN
seu.q <- FindClusters(seu.q, resolution = c(0,0.2,0.4,0.6))
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1723
Number of edges: 145289
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 1.0000
Number of communities: 1
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1723
Number of edges: 145289
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8573
Number of communities: 4
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1723
Number of edges: 145289
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7887
Number of communities: 6
Elapsed time: 0 seconds
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1723
Number of edges: 145289
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7394
Number of communities: 7
Elapsed time: 0 seconds
seu.q <- FindClusters(seu.q, resolution = c(1.2))
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 1723
Number of edges: 145289
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.6259
Number of communities: 10
Elapsed time: 0 seconds
library(clustree)
Loading required package: ggraph
Attaching package: ‘ggraph’
The following object is masked from ‘package:sp’:
geometry
clustree(seu.q, prefix = "RNA_snn_res.")
DimPlot(seu.q)
Look at the predictions in the new clusters
Based on the 3 different predictions I can lable the cell types
0 - NPC or early neurons 1 - immature excitatory neurons 2 - NPC or early neurons 3 - RG or Oligos 4- Dopaminergic neurons - possibly early 5 - NPC or early neurons 6 - Radial GLia
I will also find markers and look at a list of neuronal markers
top5 <- ClusterMarkers %>% group_by(cluster) %>% top_n(n=5, wt = avg_log2FC)
DoHeatmap(seu.q, features = top5$gene, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.6')
Warning in DoHeatmap(seu.q, features = top5$gene, size = 3, angle = 90, :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: SAT1, MTRNR2L12, MT-ND2
write.csv(ClusterMarkers,"/Users/rhalenathomas/Documents/Data/scRNAseq/PhenoID/scRNAseqSorted/Neurons1ClusterMarkers7.csv")
Explore some Gene expression levels
feature_list = c("MKI67","SOX2","POU5F1","DLX2","PAX6","SOX9","HES1","NES","RBFOX3","MAP2","NCAM1","CD24","GRIA2","GRIN2B","GABBR1","GAD1","GAD2","GABRA1","GABRB2","TH","ALDH1A1","LMX1B","NR4A2","CORIN","CALB1","KCNJ6","CXCR4","ITGA6","SLC1A3","CD44","AQP4","S100B", "PDGFRA","OLIG2","MBP","CLDN11","VIM","VCAM1")
DoHeatmap(seu.q, features = feature_list, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.6')
Warning in DoHeatmap(seu.q, features = feature_list, size = 3, angle = 90, :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: VCAM1, CLDN11, OLIG2, PDGFRA, CD44, SLC1A3, KCNJ6, CORIN, NR4A2, LMX1B, ALDH1A1, GABRA1, GAD2, GABBR1, NCAM1, RBFOX3, NES, SOX9, DLX2, POU5F1, MKI67
DotPlot(seu.q, features = feature_list) +RotatedAxis()
PD_poulin = c("TH","SLC6A3","SLC18A2","SOX6","NDNF","SNCG","ALDH1A1","CALB1","TACR2","SLC17A6","SLC32A1","OTX2","GRP","LPL","CCK","VIP")
DoHeatmap(seu.q, features = PD_poulin, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.6')
Warning in DoHeatmap(seu.q, features = PD_poulin, size = 3, angle = 90, :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: VIP, CCK, GRP, SLC32A1, TACR2, ALDH1A1, SNCG, NDNF, SOX6, SLC18A2, SLC6A3
DotPlot(seu.q, features = PD_poulin)+RotatedAxis()
ealryNeur = c("DCX","NEUROD1","TBR1")
proliferation = c("PCNA","MKI67")
neuralstem = c("SOX2","NES","PAX6","MASH1")
feature_list <- c("DCX","NEUROD1","TBR1","PCNA","MKI67","SOX2","NES","PAX6","MASH1")
DoHeatmap(seu.q, features = feature_list, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.6')
Warning in DoHeatmap(seu.q, features = feature_list, size = 3, angle = 90, :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: MASH1, NES, MKI67, PCNA
DotPlot(seu.q, features = feature_list)+RotatedAxis()
Warning in FetchData.Seurat(object = object, vars = features, cells = cells) :
The following requested variables were not found: MASH1
# no proliferation marker expression PCNA or MKI67
# cluster 4 DA neurons - shows early neuron marker and low PAX 4
# cluster 3 has higher SOX2 - neuroblast marker / NPC marker
mat_neuron = c("RBFOX3","SYP","DLG45","VAMP1","VAMP2","TUBB3","SYT1","BSN","HOMER1","SLC17A6")
# NeuN is FOX3 - RBFOX3
# PSD95 also SP-90 or DLG4
# VGLUT2 is SLC17A6
DoHeatmap(seu.q, features = mat_neuron, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.6')
Warning in DoHeatmap(seu.q, features = mat_neuron, size = 3, angle = 90, :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: HOMER1, BSN, TUBB3, VAMP1, DLG45, SYP, RBFOX3
# cluster 4 also show mature neuron markers
DotPlot(seu.q, features = mat_neuron)+RotatedAxis()
Warning in FetchData.Seurat(object = object, vars = features, cells = cells) :
The following requested variables were not found: DLG45
# excitatory neuron markers
ex = c("GRIA2","GRIA1","GRIA4","GRIN1","GRIN2B","GRIN2A","GRIN3A","GRIN3","GRIP1","CAMK2A")
DoHeatmap(seu.q, features = ex, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.6')
Warning in DoHeatmap(seu.q, features = ex, size = 3, angle = 90, group.bar.height = 0.02, :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: CAMK2A, GRIP1, GRIN3, GRIN3A, GRIN2A, GRIN1, GRIA4
DotPlot(seu.q, features = ex)+RotatedAxis()
Warning in FetchData.Seurat(object = object, vars = features, cells = cells) :
The following requested variables were not found: GRIN3
# inhibitory neuron markers
inh = c("GAD1","GAD2", "GAT1","PVALB","GABR2","GABR1","GBRR1","GABRB2","GABRB1","GABRB3","GABRA6","GABRA1","GABRA4","TRAK2")
DoHeatmap(seu.q, features = inh, size=3, angle =90, group.bar.height = 0.02, group.by = 'RNA_snn_res.0.6')
Warning in DoHeatmap(seu.q, features = inh, size = 3, angle = 90, group.bar.height = 0.02, :
The following features were omitted as they were not found in the scale.data slot for the RNA assay: TRAK2, GABRA1, GABRA6, GABRB3, GABRB1, GBRR1, GABR1, GABR2, PVALB, GAT1, GAD2
DotPlot(seu.q, features = inh)+RotatedAxis()
Warning in FetchData.Seurat(object = object, vars = features, cells = cells) :
The following requested variables were not found: GAT1, GABR2, GABR1, GBRR1
# cluster 4 is more excitatory than inhbitory but neither marker set has much expression
Checkout the Enricher cell type libraries from
N1.c6 <- ClusterMarkers %>% filter(cluster == 6 & avg_log2FC > 0)
genes <- N1.c6$gene
N1.c6.Er <- enrichr(genes, databases = db)
Uploading data to Enrichr... Done.
Querying Allen_Brain_Atlas_up... Done.
Querying Descartes_Cell_Types_and_Tissue_2021... Done.
Querying CellMarker_Augmented_2021... Done.
Querying Azimuth_Cell_Types_2021... Done.
Parsing results... Done.
plotEnrich(N1.c6.Er[[1]], showTerms = 20, numChar = 40, y = "Count", orderBy = "P.value")
plotEnrich(N1.c6.Er[[2]], showTerms = 20, numChar = 40, y = "Count", orderBy = "P.value")
plotEnrich(N1.c6.Er[[3]], showTerms = 20, numChar = 40, y = "Count", orderBy = "P.value")
plotEnrich(N1.c6.Er[[4]], showTerms = 20, numChar = 40, y = "Count", orderBy = "P.value")
N1.Er.genes.1 <- N1.c6.Er[[1]] %>% select(Term, Genes, Combined.Score)
N1.Er.genes.1
N1.Er.genes.2 <- N1.c6.Er[[2]] %>% select(Term, Genes, Combined.Score)
N1.Er.genes.2
N1.Er.genes.3 <- N1.c6.Er[[3]] %>% select(Term, Genes, Combined.Score)
N1.Er.genes.3
N1.Er.genes.4 <- N1.c6.Er[[4]] %>% select(Term, Genes, Combined.Score)
N1.Er.genes.4
NA
Library of tissue cell types for up regulated genes per cluster 0 - hypothalmus, DA A13 1- neural plate, Radial Glia 2 - Neural stem 3 - stromal, astro OPC 4 - Neurons 5 - endothelial, pericyte 6 - maybe neurons maybe not
By the combined information - annotate the clusters in Neurons1
cluster.ids <- c("EarlyNeurons","Neurons","NPC","OPC-RG","DAneurons",
"Other","RG")
unique(seu.q$RNA_snn_res.0.6)
[1] 1 5 0 3 4 2 6
Levels: 0 1 2 3 4 5 6
names(cluster.ids) <- levels(seu.q)
seu.q <- RenameIdents(seu.q, cluster.ids)
seu.q$subgroups <- Idents(seu.q)
DimPlot(seu.q, reduction = "umap", label = TRUE, group.by = 'subgroups', repel = TRUE)
There is a problem with the data transfer in DA Subgroups.
Have to debug later
anchors <- FindTransferAnchors(reference = DAsubtypes.sub, query = seu.q, dims = 1:25)
Performing PCA on the provided reference using 1578 features as input.
Projecting cell embeddings
Finding neighborhoods
Finding anchors
Found 149 anchors
Filtering anchors
Retained 74 anchors
print("getting predictions")
[1] "getting predictions"
predictions <- TransferData(anchorset = anchors, refdata = DAsubtypes.sub$Cell_Subtype) ## error
Finding integration vectors
Finding integration vector weights
Error in idx[i, ] <- res[[i]][[1]] :
number of items to replace is not a multiple of replacement length
Merge the files
Idents(Neurons1ft) <- 'orig.ident'
Neurons1ft$orig.ident <- 'Neurons1'
Idents(Neurons2ft) <- 'orig.ident'
Neurons2ft$orig.ident <- 'Neurons2'
Idents(Glia1ft) <- 'orig.ident'
Glia1ft$orig.ident <- 'Glia1'
Idents(Glia2ft) <- 'orig.ident'
Glia2ft$orig.ident <- 'Glia2'
sorted.merge <- merge(Neurons1ft, y=c(Neurons2ft,Glia1ft,Glia2ft), add.cell.ids = c("Neurons1","Neurons2","Glia1","Glia2"), project = "SortedCells")
sorted.merge
An object of class Seurat
33544 features across 138206 samples within 2 assays
Active assay: RNA (33538 features, 0 variable features)
1 other assay present: HTO
table(all.ft$orig.ident)
Glia1 Glia2 Neurons1 Neurons2
55268 10429 3418 29575
Prepare for the clustering
all.ft <- ScaleData(all.ft)
Centering and scaling data matrix
|
| | 0%
|
|============================================== | 50%
|
|=============================================================================================| 100%
all.ft <- RunPCA(all.ft)
PC_ 1
Positive: STMN2, CD24, CHGB, CELF4, GRIA2, CACNA2D1, EOMES, INSM1, PCAT4, NEUROD1
MARCH1, GAP43, RASGRP1, MIAT, STMN4, PCSK2, IMPG2, CHGA, INA, MGAT4C
CDHR1, NRN1, GNG8, ST18, IGFBPL1, DDC, LHX1, SLC17A6, NEUROG1, PKIB
Negative: VIM, COL3A1, LGALS1, IGFBP7, DCN, APOE, COL1A1, SPARC, IGFBP4, MDK
TIMP1, GSTP1, HSP90B1, S100A11, COL1A2, IGFBP5, TMSB10, GAPDH, S100A6, SPATS2L
RPL7A, CALD1, ZFP36L1, IGFBP2, RPL10, FTL, SPARCL1, FABP5, S100A10, IFITM3
PC_ 2
Positive: COL3A1, DCN, COL1A1, IGFBP7, COL1A2, APOE, VIM, HPD, IGFBP4, CP
FABP5, FTL, S100A11, S100A6, VCAN, NDUFA4L2, LGALS1, WFIKKN2, COL6A3, MGP
TIMP1, IFITM3, LUM, CYP1B1, RPS12, CRYAB, ID3, CXCL14, WIF1, APOC1
Negative: CELF4, CACNA2D1, INA, STMN2, CHGB, PCDH9, ANK3, MARCH1, EOMES, GRIA2
SOX4, DCX, SSTR2, STMN1, NEUROD1, TENM3, INSM1, ATCAY, OLFM1, STMN4
CRMP1, SYT1, GAP43, ATP1A3, NCKAP5, AMER2, PCLO, ELAVL3, CA10, NRXN1
PC_ 3
Positive: COL3A1, COL1A2, COL1A1, VCAN, COL6A3, S100A4, LUM, DCN, COL4A1, COL4A2
OGN, EDNRA, COL6A1, COL5A2, APOD, PRRX1, COL6A2, IGFBP4, FN1, MFAP4
NDUFA4L2, THY1, TNFAIP6, PCOLCE, COL15A1, COL5A1, IFI16, PLAC9, CD248, BGN
Negative: PTPRZ1, WLS, SOX2, GABBR2, CNTNAP2, SPARCL1, LY6H, CLU, GDF10, CRYAB
SERPINI1, S100B, WNT2B, ESM1, SLC2A1, ITM2C, LIX1, TPBG, RFX4, CBLN1
RSPO2, MEGF10, NELL2, MGST1, SCG2, HPD, VAT1L, ABCA8, PLTP, SLC1A3
PC_ 4
Positive: WFIKKN2, CP, ECEL1, SERPINF1, KRT18, ID1, PCP4, CYP1B1, RBP1, IGFBP7
TPBG, KRT8, FHIT, WIF1, CA2, TMSB4X, BMP4, HPD, FOLR1, CRABP2
SPINT2, EFEMP1, CXCL14, KCNJ13, ASCL1, GPNMB, MDK, EMX2, SLC4A10, RPS12
Negative: PTN, PTPRZ1, GABBR2, MEGF10, DLK1, SCG2, ATP1A2, FN1, SGCD, SOX2
ABCA8, NTRK2, RFX4, ESM1, COL4A1, MARCKS, COL4A2, VCAN, VAT1L, GRIN2A
NRIP3, SCRG1, HAP1, OGN, SORCS2, EDNRB, SFRP2, PLP1, LUM, FFAR4
PC_ 5
Positive: TPBG, FN1, OGN, DKK3, IFI16, MSX1, CXCL14, OLFML2A, CA2, PAPPA2
NID1, SYNE2, ENPP2, SLC4A10, CDC42EP3, KCNJ13, IGF1, GPHN, IGFBP2, ZEB2
EXPH5, NOV, TRPM3, HTR2C, GPNMB, MPZ, CYP1B1, CADM2, SLC5A3, PIK3R1
Negative: APOE, APOC1, DCN, FABP5, DLK1, PTN, KDR, PRSS56, CLU, IGFBP4
BST2, TFPI2, S100A6, PTGDS, CITED4, COL23A1, RPL10, FTL, SLC7A2, IFITM3
FABP4, NDUFA4L2, GNG11, RARRES1, FXYD5, SCG2, WFDC1, TMEM176B, RPL7A, TMEM176A
plot <- VizDimLoadings(all.ft, dims = 1:2, reduction = "pca")
#SaveFigure(plot, "viz_PCA_loadings", width = 10, height = 8)
plot
NA
NA
plot <- DimPlot(all.ft, reduction = "pca", group.by = "orig.ident")
plot
plot <- ElbowPlot(all.ft,ndims = 50)
plot
# best is the
VlnPlot(all.ft, features = c("NCAM1","CD44","CD24","AQP4","TH"), ncol = 3, group.by = 'orig.ident')
The expression of the selected markers are not very revealing - there are neuronal markers in the Glia
I wish I had done an unsorted sample
Now I have all 4 samples I want to try the integration steps
Warning message:
R graphics engine version 15 is not supported by this version of RStudio. The Plots tab will be disabled until a newer version of RStudio is installed.
all.ft <- readRDS("/Users/rhalenathomas/Documents/Data/scRNAseq/PhenoID/scRNAseqSorted/cellrangerOuts/AllcellsObjectSept20filtered.Rds")
Registered S3 method overwritten by 'data.table':
method from
print.data.table
Registered S3 method overwritten by 'htmlwidgets':
method from
print.htmlwidget tools:rstudio
all.int <- IntegrateData(anchorset = all.anchors)
Merging dataset 1 into 2
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 4 into 3
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 2 1 into 3 4
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Now that I have the integrated object I will continue to follow the standard scRNAseq to workflow and identify cell types.
Cluster
all.int <- FindNeighbors(all.int, dims = 1:30, k.param = 275)
Computing nearest neighbor graph
Computing SNN
Try on the not integrated object - just merge
all.ft <- RunUMAP(all.ft, reduction = "pca", n.neighbors = 30, dims = 1:21)
16:34:29 UMAP embedding parameters a = 0.9922 b = 1.112
16:34:29 Read 92644 rows and found 21 numeric columns
16:34:29 Using Annoy for neighbor search, n_neighbors = 30
16:34:29 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:34:35 Writing NN index file to temp file /var/folders/k4/khtkczkd5tn732ftjpwgtr240000gn/T//RtmpxdKhUe/file114bc936cbd
16:34:35 Searching Annoy index using 1 thread, search_k = 3000
16:35:03 Annoy recall = 100%
16:35:04 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
16:35:08 Initializing from normalized Laplacian + noise (using irlba)
16:35:16 Commencing optimization for 200 epochs, with 4273678 positive edges
Using method 'umap'
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:36:16 Optimization finished
DimPlot(all.ft, reduction = "umap", group.by = "orig.ident")
Find the clusters
all.ft <- FindClusters(all.ft, resolution = 0.3)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 92644
Number of edges: 2838478
Running Louvain algorithm...
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.8755
Number of communities: 9
Elapsed time: 34 seconds
#clustree(all.int, prefix = "RNA_snn_res.")
DimPlot(all.ft, reduction = "umap")
Transfer labels from other Midbrain organoid data (165 days)
# this is the reference data
MBO <- readRDS("/Users/rhalenathomas/Documents/Data/scRNAseq/AST23_BrainComm/MBOclusters_names29072021.rds")
Idents(MBO) <- "cluster_labels"
DefaultAssay(all.ft) <- "RNA"
DefaultAssay(MBO) <- "RNA"
MO.query <- all.ft
# find the reference anchors
print("finding reference anchors")
[1] "finding reference anchors"
MO.anchors <- FindTransferAnchors(reference = MBO ,query = MO.query, dims = 1:30)
Performing PCA on the provided reference using 2000 features as input.
Projecting cell embeddings
Finding neighborhoods
Finding anchors
Found 1965 anchors
Filtering anchors
Retained 588 anchors
# can try a range of dims 20-50
print("getting predictions")
[1] "getting predictions"
predictions <- TransferData(anchorset = MO.anchors, refdata = MBO$cluster_labels)
Finding integration vectors
Finding integration vector weights
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Predicting cell labels
MO.query <- AddMetaData(MO.query, metadata = predictions)
print(table(MO.query$predicted.id))
Epithelial Neural Precursors Neurons-DA Neurons-e Oligodendrocytes
1152 396 943 13239 59074
RGa RGd1 RGd2
14 17776 50
See the predictions
DimPlot(MO.query, reduction = "umap", group.by = 'predicted.id')
NA
NA
Some heatmaps
See the usual marker lists - by sorted population, by clusters
Markers.glia1Vs2 <- FindMarkers(all.ft, ident.1 = c('Glia1'), ident.2 = c('Glia2'))
| | 0 % ~calculating
|+ | 1 % ~01m 12s
|+ | 2 % ~01m 12s
|++ | 3 % ~01m 11s
|++ | 4 % ~01m 10s
|+++ | 5 % ~01m 09s
|+++ | 6 % ~01m 08s
|++++ | 7 % ~01m 08s
|++++ | 8 % ~01m 07s
|+++++ | 9 % ~01m 07s
|+++++ | 10% ~01m 06s
|++++++ | 11% ~01m 04s
|++++++ | 12% ~01m 03s
|+++++++ | 13% ~01m 02s
|+++++++ | 14% ~01m 01s
|++++++++ | 15% ~59s
|++++++++ | 16% ~59s
|+++++++++ | 17% ~58s
|+++++++++ | 18% ~57s
|++++++++++ | 19% ~56s
|++++++++++ | 20% ~55s
|+++++++++++ | 21% ~54s
|+++++++++++ | 22% ~53s
|++++++++++++ | 23% ~53s
|++++++++++++ | 24% ~52s
|+++++++++++++ | 25% ~51s
|+++++++++++++ | 26% ~51s
|++++++++++++++ | 27% ~50s
|++++++++++++++ | 28% ~49s
|+++++++++++++++ | 29% ~49s
|+++++++++++++++ | 30% ~48s
|++++++++++++++++ | 31% ~47s
|++++++++++++++++ | 32% ~47s
|+++++++++++++++++ | 33% ~46s
|+++++++++++++++++ | 34% ~45s
|++++++++++++++++++ | 35% ~45s
|++++++++++++++++++ | 36% ~44s
|+++++++++++++++++++ | 37% ~43s
|+++++++++++++++++++ | 38% ~42s
|++++++++++++++++++++ | 39% ~42s
|++++++++++++++++++++ | 40% ~41s
|+++++++++++++++++++++ | 41% ~41s
|+++++++++++++++++++++ | 42% ~40s
|++++++++++++++++++++++ | 43% ~39s
|++++++++++++++++++++++ | 44% ~38s
|+++++++++++++++++++++++ | 45% ~38s
|+++++++++++++++++++++++ | 46% ~37s
|++++++++++++++++++++++++ | 47% ~36s
|++++++++++++++++++++++++ | 48% ~36s
|+++++++++++++++++++++++++ | 49% ~35s
|+++++++++++++++++++++++++ | 50% ~34s
|++++++++++++++++++++++++++ | 51% ~34s
|++++++++++++++++++++++++++ | 52% ~33s
|+++++++++++++++++++++++++++ | 53% ~32s
|+++++++++++++++++++++++++++ | 54% ~32s
|++++++++++++++++++++++++++++ | 55% ~31s
|++++++++++++++++++++++++++++ | 56% ~30s
|+++++++++++++++++++++++++++++ | 57% ~30s
|+++++++++++++++++++++++++++++ | 58% ~29s
|++++++++++++++++++++++++++++++ | 59% ~28s
|++++++++++++++++++++++++++++++ | 60% ~28s
|+++++++++++++++++++++++++++++++ | 61% ~27s
|+++++++++++++++++++++++++++++++ | 62% ~26s
|++++++++++++++++++++++++++++++++ | 63% ~26s
|++++++++++++++++++++++++++++++++ | 64% ~25s
|+++++++++++++++++++++++++++++++++ | 65% ~24s
|+++++++++++++++++++++++++++++++++ | 66% ~24s
|++++++++++++++++++++++++++++++++++ | 67% ~23s
|++++++++++++++++++++++++++++++++++ | 68% ~22s
|+++++++++++++++++++++++++++++++++++ | 69% ~21s
|+++++++++++++++++++++++++++++++++++ | 70% ~21s
|++++++++++++++++++++++++++++++++++++ | 71% ~20s
|++++++++++++++++++++++++++++++++++++ | 72% ~19s
|+++++++++++++++++++++++++++++++++++++ | 73% ~19s
|+++++++++++++++++++++++++++++++++++++ | 74% ~18s
|++++++++++++++++++++++++++++++++++++++ | 75% ~17s
|++++++++++++++++++++++++++++++++++++++ | 76% ~17s
|+++++++++++++++++++++++++++++++++++++++ | 77% ~16s
|+++++++++++++++++++++++++++++++++++++++ | 78% ~15s
|++++++++++++++++++++++++++++++++++++++++ | 79% ~14s
|++++++++++++++++++++++++++++++++++++++++ | 80% ~14s
|+++++++++++++++++++++++++++++++++++++++++ | 81% ~13s
|+++++++++++++++++++++++++++++++++++++++++ | 82% ~12s
|++++++++++++++++++++++++++++++++++++++++++ | 83% ~12s
|++++++++++++++++++++++++++++++++++++++++++ | 84% ~11s
|+++++++++++++++++++++++++++++++++++++++++++ | 85% ~10s
|+++++++++++++++++++++++++++++++++++++++++++ | 86% ~10s
|++++++++++++++++++++++++++++++++++++++++++++ | 87% ~09s
|++++++++++++++++++++++++++++++++++++++++++++ | 88% ~08s
|+++++++++++++++++++++++++++++++++++++++++++++ | 89% ~08s
|+++++++++++++++++++++++++++++++++++++++++++++ | 90% ~07s
|++++++++++++++++++++++++++++++++++++++++++++++ | 91% ~06s
|++++++++++++++++++++++++++++++++++++++++++++++ | 92% ~06s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 93% ~05s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 94% ~04s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 95% ~03s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~03s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~02s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 98% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=01m 09s
# Compare Neurons1 vs Neurons2
Markers.neurons1vs2 <- FindMarkers(all.ft, ident.1 = c('Neurons1'), ident.2 = c('Neurons2'))
For a more efficient implementation of the Wilcoxon Rank Sum Test,
(default method for FindMarkers) please install the limma package
--------------------------------------------
install.packages('BiocManager')
BiocManager::install('limma')
--------------------------------------------
After installation of limma, Seurat will automatically use the more
efficient implementation (no further action necessary).
This message will be shown once per session
| | 0 % ~calculating
|+ | 1 % ~17s
|++ | 3 % ~17s
|++ | 4 % ~17s
|+++ | 5 % ~17s
|++++ | 6 % ~16s
|++++ | 8 % ~16s
|+++++ | 9 % ~16s
|++++++ | 10% ~16s
|++++++ | 12% ~16s
|+++++++ | 13% ~16s
|++++++++ | 14% ~15s
|++++++++ | 16% ~15s
|+++++++++ | 17% ~15s
|++++++++++ | 18% ~15s
|++++++++++ | 19% ~15s
|+++++++++++ | 21% ~14s
|++++++++++++ | 22% ~14s
|++++++++++++ | 23% ~14s
|+++++++++++++ | 25% ~14s
|+++++++++++++ | 26% ~14s
|++++++++++++++ | 27% ~13s
|+++++++++++++++ | 29% ~13s
|+++++++++++++++ | 30% ~13s
|++++++++++++++++ | 31% ~13s
|+++++++++++++++++ | 32% ~12s
|+++++++++++++++++ | 34% ~12s
|++++++++++++++++++ | 35% ~12s
|+++++++++++++++++++ | 36% ~12s
|+++++++++++++++++++ | 38% ~11s
|++++++++++++++++++++ | 39% ~11s
|+++++++++++++++++++++ | 40% ~11s
|+++++++++++++++++++++ | 42% ~11s
|++++++++++++++++++++++ | 43% ~11s
|+++++++++++++++++++++++ | 44% ~10s
|+++++++++++++++++++++++ | 45% ~10s
|++++++++++++++++++++++++ | 47% ~10s
|+++++++++++++++++++++++++ | 48% ~10s
|+++++++++++++++++++++++++ | 49% ~09s
|++++++++++++++++++++++++++ | 51% ~09s
|++++++++++++++++++++++++++ | 52% ~09s
|+++++++++++++++++++++++++++ | 53% ~09s
|++++++++++++++++++++++++++++ | 55% ~08s
|++++++++++++++++++++++++++++ | 56% ~08s
|+++++++++++++++++++++++++++++ | 57% ~08s
|++++++++++++++++++++++++++++++ | 58% ~08s
|++++++++++++++++++++++++++++++ | 60% ~07s
|+++++++++++++++++++++++++++++++ | 61% ~07s
|++++++++++++++++++++++++++++++++ | 62% ~07s
|++++++++++++++++++++++++++++++++ | 64% ~07s
|+++++++++++++++++++++++++++++++++ | 65% ~06s
|++++++++++++++++++++++++++++++++++ | 66% ~06s
|++++++++++++++++++++++++++++++++++ | 68% ~06s
|+++++++++++++++++++++++++++++++++++ | 69% ~06s
|++++++++++++++++++++++++++++++++++++ | 70% ~05s
|++++++++++++++++++++++++++++++++++++ | 71% ~05s
|+++++++++++++++++++++++++++++++++++++ | 73% ~05s
|++++++++++++++++++++++++++++++++++++++ | 74% ~05s
|++++++++++++++++++++++++++++++++++++++ | 75% ~05s
|+++++++++++++++++++++++++++++++++++++++ | 77% ~04s
|+++++++++++++++++++++++++++++++++++++++ | 78% ~04s
|++++++++++++++++++++++++++++++++++++++++ | 79% ~04s
|+++++++++++++++++++++++++++++++++++++++++ | 81% ~04s
|+++++++++++++++++++++++++++++++++++++++++ | 82% ~03s
|++++++++++++++++++++++++++++++++++++++++++ | 83% ~03s
|+++++++++++++++++++++++++++++++++++++++++++ | 84% ~03s
|+++++++++++++++++++++++++++++++++++++++++++ | 86% ~03s
|++++++++++++++++++++++++++++++++++++++++++++ | 87% ~02s
|+++++++++++++++++++++++++++++++++++++++++++++ | 88% ~02s
|+++++++++++++++++++++++++++++++++++++++++++++ | 90% ~02s
|++++++++++++++++++++++++++++++++++++++++++++++ | 91% ~02s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 92% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++ | 94% ~01s
|++++++++++++++++++++++++++++++++++++++++++++++++ | 95% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 96% ~01s
|+++++++++++++++++++++++++++++++++++++++++++++++++ | 97% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 99% ~00s
|++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=18s
Check cell types using EnrichR